Studying the Performance of the Jellyfish Search Optimiser for the Application of Projection Pursuit

H. Sherry Zhang

University of Texas at Austin

2024-09-04

Table of content

  • Background: Projection pursuit (PP) and optimisation
  • The Jellyfish Search Optimiser (JSO)
  • Two new metrics, smoothness and squintability, to quantify PP optimisation
  • Simulation study

Optimisation in projection pursuit

  • Data: \(\mathbf{X}_{n \times p}\); Basis: \(\mathbf{A}_{p\times d}\)
  • Projection: \(\mathbf{Y} = \mathbf{X} \cdot \mathbf{A}\)
  • Index function: \(f: \mathbb{R}^{n \times d} \mapsto \mathbb{R}\)
  • Optimisation: \(\arg \max_{\mathbf{A}} f(\mathbf{X} \cdot \mathbf{A}) ~~~ s.t. ~~~ \mathbf{A}^{\prime} \mathbf{A} = I_d\)
  • 5 vars (\(x_1\) - \(x_5\)), 1000 obs simulated
    • One variable (\(x_2\)) is a mixture normal
    • others are random normal
  • 1D projection using the holes index: \(\propto 1 -\frac{1}{n} \sum_{i = 1}^n \exp(-\frac{1}{2} y_i y_i^{\prime})\)

Motivation

The work also reveals inadequacies in the tour optimization algorithm, that may benefit from newly developed techniques and software tools. Exploring this area would help improve the guided tours. As new optimization techniques become available, adapting these to the guided tour would extend the technique to a broader range of problems. (Laa & Cook, 2020)

Continuation of the previous work

The Jellyfish search optimiser

Chou, J. S., & Truong, D. N. (2021). A novel metaheuristic optimizer inspired by behavior of jellyfish in ocean. Applied Mathematics and Computation, 389, 125535.

A diagram to explain the Jellyfish search optimiser

A snapshot of the optimiser in the R code (maybe need to explain the tourr code??)

An animation of JSO in action

Properties proposed

✅ smoothness, ✅ squintability, ❌ flexibility, ❌ rotation invariance, ✅ speed

Smoothness

what does it mean by smooth and not smooth

Squintability

A small squint angle because you have to be very close to the optimal projection plane to be able to see the structure of the data.

(we see a hill over there! Now we see a hill)

Define Squintability

Projection distance between two bases \(A\) and \(A^*\), \(d(A, A^*)\):

\[d(A, A^*) = \lVert AA^\prime - A^*A^{*\prime}\ \rVert _F\]

where \(\lVert . \rVert _F\) denotes the Frobenius norm, given by \(\lVert M \rVert _F = \sqrt{\sum_{ij} M_{ij}^2}\).


Index-distance curve \(g\) maps \(d(A, A^*)\) to the index value \(f(XA)\), such that \[g(d(A, A^*)) = f(XA)\]

Squintability:

\[\varsigma(f) = -c \times \max_{d} g'(d) \times \arg \max_{d} g'(d)\]

use c = 4 to be consistent with estimating with parametric model

Calculate squintability

sigmoid curve: \(\ell(x):=\frac{1}{1+\exp(\theta_{3}(x-\theta_{2}))}\ \)

parametric model: \(f(x)=(\theta_{1}-\theta_{4})\frac{\ell(x)-\ell(x_{\max})}{\ell(0)-\ell(x_{\max})}+\theta_{4}\ \)

Squintability: \(\varsigma=\frac{(\theta_{1}-\theta_{4})\theta_{2}\theta_{3}}{\ell(0)-\ell(x_{\max})}\)

Example:

shall I do a specific calculation example here?

Simulation setup

the “pipe” shape:

the holes index

data dimension d = 6, 8, 10, 12

  • different JSO hyper-parameters:

    • number of jellyfishes (20, 50, 100)
    • maximum number of tries (50, 100)

the “sine” shape:

index function: “MIC”, “TIC”, “dcor2d”, “loess2d”, “splines2d”, “stringy”

data dimension d = 6

For “MIC” and “TIC”, d = 8 is also included

Sucess rate

50 repetitions for each case to calculate success rate

xxx out of 50 that finds a final index value within 0.05 of the best index value found among all 50 simulations

The generalised linear model

a binomial family and a logit link function

success rate ~ smoothness + squintability + n_jellies + max_tries + d + long_time

data pre-processing:

  1. divide n_jellies and max_tries by 10 for interpretation,
  2. new binary variable long_time for average run time over 30 seconds
index d smoothness squintability n. jellyfish max. tries success rate time (sec)
MIC 6 2.46 1.26 20 50 0.12 2.48
MIC 6 2.46 1.26 20 100 0.24 8.95
MIC 6 2.46 1.26 50 50 0.52 5.65
MIC 6 2.46 1.26 50 100 0.64 13.22
MIC 6 2.46 1.26 100 50 0.76 19.45

Results

term estimate std.error statistic p.value
Intercept -4.52 5.33 -0.85 0.40
Smoothness 1.19 1.92 0.62 0.53
Squintability 2.06 0.68 3.01 0.00
Dimension (d) -0.63 0.26 -2.46 0.01
Long time -0.87 1.29 -0.68 0.50
N. jellyfish 0.22 0.13 1.70 0.09
Max. tries 0.11 0.15 0.75 0.45
  • The signs of the variables are as expected
  • The variable squintability and dimension are significant - sugguesting their influence the success of the optimisation

Takeaway

  • The JSO is implemented in the tourr package.

  • The two metrics, smoothness and squintability, are proposed and implemented in the ferrn package. They can be used to characterise the difficulty of the optimisation task and inform the choice of optimiser.

  • Large squintability indicates a distinct difference in index values between optimal regions and others. When one jellyfish enters the optimal region, the high index value it generates will be clearly distinguishable from spurious values and lead other jellyfish to move toward the optimal region.

But also notice …

  • The two metrics are relative - comparison should be made with same parameters.

  • Index values should be standardised to the [0, 1] interval

  • Computational cost associated with population-based optimiser, i.e. JSO

🔗